
Replacing created_at with the block_timestamp #9728

Closed (wants to merge 2 commits)

Conversation

@mateusz-sekara (Collaborator) commented Jun 30, 2023

There are a couple of issues with the current approach, in which we order logs by created_at.

Sorting by created_at cannot be satisfied by an index, so Postgres sorts the result set at query time (here it even spills to disk):

explain analyze SELECT * FROM evm_logs
                        WHERE evm_chain_id = 1
                        AND address = '\xbb0f07700587fee22c5863ed75a5e05b68fcac4763449a582987fa4e8e9a2d0f'
                        AND event_sig = '\xa32c5f1d88735034143171f18fbb4d447fbbe0fbf4c98733d54092a081ba5d2a'
                        AND created_at > '2010-10-10'
                       AND (block_number) <=  (SELECT COALESCE(block_number, 0) FROM evm_log_poller_blocks WHERE evm_chain_id = 1 order by block_number desc LIMIT 1)
                        ORDER BY created_at ASC;

 Gather Merge  (cost=63063.08..73999.94 rows=93738 width=235) (actual time=114.722..125.164 rows=33413 loops=1)
   Workers Planned: 2
   Params Evaluated: $0
   Workers Launched: 2
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..0.57 rows=1 width=16) (actual time=0.038..0.039 rows=1 loops=1)
           ->  Index Only Scan Backward using log_poller_blocks_pkey on evm_log_poller_blocks  (cost=0.43..96592.10 rows=671267 width=16) (actual time=0.036..0.036 rows=1 loops=1)
                 Index Cond: (evm_chain_id = '1'::numeric)
                 Heap Fetches: 1
   ->  Sort  (cost=62062.48..62179.65 rows=46869 width=235) (actual time=105.857..107.012 rows=11138 loops=3)
         Sort Key: evm_logs.created_at
         Sort Method: external merge  Disk: 5040kB
         Worker 0:  Sort Method: external merge  Disk: 2112kB
         Worker 1:  Sort Method: external merge  Disk: 2112kB
         ->  Parallel Seq Scan on evm_logs  (cost=0.00..53137.80 rows=46869 width=235) (actual time=29.668..98.472 rows=11138 loops=3)
               Filter: ((created_at > '2010-10-10 00:00:00+00'::timestamp with time zone) AND (block_number <= $0) AND (evm_chain_id = '1'::numeric) AND (address = '\xbb0f07700587fee22c5863ed75a5e05b68fcac4763449a582987fa4e8e9a2d0f'::bytea) AND (event_sig = '\xa32c5f1d88735034143171f18fbb4d447fbbe0fbf4c98733d54092a081ba5d2a'::bytea))
               Rows Removed by Filter: 388862
 Planning Time: 0.456 ms
 Execution Time: 127.522 ms
(19 rows)

This query can be optimized a little by adding other columns to the index with INCLUDE, or by selecting only indexed columns (so no heap scan is needed). However, the duration still ends up around 100 ms, and the sort still has to be performed at query time.

created_at is set at insert time using the NOW() function. These timestamps can differ between nodes, which can prevent the nodes from reaching consensus because they may get completely different logs from the log poller.

Current insert implementation

err := q.ExecQNamed(`INSERT INTO evm_logs 
(evm_chain_id, log_index, block_hash, block_number, block_timestamp, address, event_sig, topics, tx_hash, data, created_at) VALUES 
(:evm_chain_id, :log_index, :block_hash, :block_number, :block_timestamp, :address, :event_sig, :topics, :tx_hash, :data, NOW()) ON CONFLICT DO NOTHING`, logs[start:end])

Solution

Instead of sorting by date, let's sort by block number. We can run one small subquery that picks the block_number matching the timestamp and use it for filtering, which should save us some latency. What's more, instead of created_at we should move to block_timestamp, which is a value produced on-chain, so every node running the code should see identical values.

Query plans

Subquery picking the block_number based on the timestamp

chainlink_test=# explain analyze SELECT COALESCE(block_number, 0) FROM evm_log_poller_blocks WHERE evm_chain_id = 1 and block_timestamp > '2010-10-10' ORDER BY block_number LIMIT 1;
                                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..0.78 rows=1 width=16) (actual time=0.071..0.072 rows=1 loops=1)
   ->  Index Scan using log_poller_blocks_pkey on evm_log_poller_blocks  (cost=0.56..728781.59 rows=3328667 width=16) (actual time=0.069..0.069 rows=1 loops=1)
         Index Cond: (evm_chain_id = '1'::numeric)
         Filter: (block_timestamp > '2010-10-10 00:00:00+00'::timestamp with time zone)
 Planning Time: 0.317 ms
 Execution Time: 0.111 ms

Entire query

chainlink_test=# explain analyze SELECT * FROM evm_logs
                        WHERE evm_chain_id = 1
                        AND address = '\xbb0f07700587fee22c5863ed75a5e05b68fcac4763449a582987fa4e8e9a2d0f'
                        AND event_sig = '\xa32c5f1d88735034143171f18fbb4d447fbbe0fbf4c98733d54092a081ba5d2a'
                        AND block_number >= (SELECT COALESCE(block_number, 0) FROM evm_log_poller_blocks WHERE evm_chain_id = 1 and block_timestamp > '2010-10-10' ORDER BY block_number LIMIT 1)
                        AND (block_number) <=  (SELECT COALESCE(block_number, 0) FROM evm_log_poller_blocks WHERE evm_chain_id = 1 order by block_number desc LIMIT 1)
                        ORDER BY block_number ASC;

----------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=5975.62..5979.84 rows=1687 width=235) (actual time=57.351..61.167 rows=33413 loops=1)
   Sort Key: evm_logs.block_number
   Sort Method: external merge  Disk: 9240kB
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..0.57 rows=1 width=16) (actual time=0.055..0.056 rows=1 loops=1)
           ->  Index Scan using log_poller_blocks_pkey on evm_log_poller_blocks  (cost=0.43..98270.26 rows=671200 width=16) (actual time=0.053..0.053 rows=1 loops=1)
                 Index Cond: (evm_chain_id = '1'::numeric)
                 Filter: (block_timestamp > '2010-10-10 00:00:00+00'::timestamp with time zone)
   InitPlan 2 (returns $1)
     ->  Limit  (cost=0.43..0.57 rows=1 width=16) (actual time=0.026..0.026 rows=1 loops=1)
           ->  Index Only Scan Backward using log_poller_blocks_pkey on evm_log_poller_blocks evm_log_poller_blocks_1  (cost=0.43..96592.10 rows=671267 width=16) (actual time=0.026..0.026 rows=1 loops=1)
                 Index Cond: (evm_chain_id = '1'::numeric)
                 Heap Fetches: 1
   ->  Bitmap Heap Scan on evm_logs  (cost=216.06..5884.05 rows=1687 width=235) (actual time=17.892..33.581 rows=33413 loops=1)
         Recheck Cond: ((evm_chain_id = '1'::numeric) AND (block_number >= $0) AND (block_number <= $1) AND (address = '\xbb0f07700587fee22c5863ed75a5e05b68fcac4763449a582987fa4e8e9a2d0f'::bytea) AND (event_sig = '\xa32c5f1d88735034143171f18fbb4d447fbbe0fbf4c98733d54092a081ba5d2a'::bytea))
         Heap Blocks: exact=3705
         ->  Bitmap Index Scan on evm_logs_idx  (cost=0.00..215.64 rows=1687 width=0) (actual time=17.368..17.368 rows=33413 loops=1)
               Index Cond: ((evm_chain_id = '1'::numeric) AND (block_number >= $0) AND (block_number <= $1) AND (address = '\xbb0f07700587fee22c5863ed75a5e05b68fcac4763449a582987fa4e8e9a2d0f'::bytea) AND (event_sig = '\xa32c5f1d88735034143171f18fbb4d447fbbe0fbf4c98733d54092a081ba5d2a'::bytea))
 Planning Time: 0.595 ms
 Execution Time: 64.111 ms

@github-actions (Contributor) commented:
I see that you haven't updated any CHANGELOG files. Would it make sense to do so?

Comment on lines 474 to 476
AND block_number >= (SELECT COALESCE(block_number, 0) FROM evm_log_poller_blocks WHERE evm_chain_id = $1 and block_timestamp > $6 ORDER BY block_number LIMIT 1)
AND (block_number + $7) <= (SELECT COALESCE(block_number, 0) FROM evm_log_poller_blocks WHERE evm_chain_id = $1 ORDER BY block_number DESC LIMIT 1)
ORDER BY created_at ASC`, utils.NewBig(o.chainID), address, eventSig.Bytes(), topicIndex+1, pq.ByteaArray(topicValuesBytes), after, confs)
ORDER BY block_number ASC`, utils.NewBig(o.chainID), address, eventSig.Bytes(), topicIndex+1, pq.ByteaArray(topicValuesBytes), after, confs)
Collaborator:
I understand it might be tricky; can we have a test case for it?

@mateusz-sekara (Author):
@samsondav please take a look, I wrote a couple of tests. Additionally, I reverted the changes from orm.go and ran those tests against the old implementation: they did not pass because of the created_at behavior, which is expected ;)

samsondav previously approved these changes Jul 10, 2023
reductionista previously approved these changes Jul 10, 2023
@reductionista (Contributor) left a comment:
I'm tempted to say we should change the name to SelectLogsAfterBlockTimestamp to avoid confusion, but I guess probably best to leave as-is to avoid an unnecessary API change.

@mateusz-sekara (Author):

> I'm tempted to say we should change the name to SelectLogsAfterBlockTimestamp to avoid confusion, but I guess probably best to leave as-is to avoid an unnecessary API change.

We can give it a try in a separate PR ;)

samsondav previously approved these changes Jul 12, 2023
Commit: …h other log poller functions (block_no, log_index)
@mateusz-sekara mateusz-sekara deleted the fix/log-poller-created-at branch September 19, 2023 10:44